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ABSTRACT 

In this paper, we describe the experiences of the Auto- 
mated Software Engineering Group at the NASA Ames 
Research Center in the development and application 
of three different transformation systems. The sys- 
tems span the entire technology range, from deduc- 
tive synthesis, to logic-based transformation, to almost 
compiler-like source-to-source transformation. These 
systems also span a range of NASA applications, includ- 
ing solving solar system geometry problems, generat- 
ing data analysis software, and analyzing multithreaded 
Java code. 
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1 INTRODUCTION 

In this paper, we describe the experiences of the Auto- 
mated Software Engineering Group at the NASA Ames 
Research Center in the development and application of 
three different transformation systems. The systems 
span the entire technology range from deductive synthe- 
sis, to logic-based transformation, to almost compiler- 
like source-to-source transformation. 

Deductive synthesis systems [11] have the same goal 
as “classical” (i.e., non-deductive) transformation sys- 
tems, namely the (mostly automatic) translation of 
non-executable specifications into executable programs. 
They use, however, a different technology. The specifi- 
cation is recast as a theorem in the first-order predicate 
calculus and submitted to an automatic theorem prover. 
The resulting program is then extracted from the proof 
which must thus be constructive. 

Deductive synthesis is not an alternative to but an in- 
stance of the general transformational software devel- 


opment paradigm. This is a consequence of the dual 
nature of proofs. On the one hand, a proof, especially 
a tableau-style proof [12], is a stepwise transformation 
of a theorem into axioms. On the other hand, it can 
be considered as a program, due to the Curry-Howard 
isomorphism [7]. 

The main difference between the classical transforma- 
tion and deductive synthesis approaches is the question 
of control , i.e., who is in charge to control the trans- 
formation/proof process? In a classical transformation 
system, the system designer is in charge. Usually, the 
transformation process is constrained by organizing the 
set of transforms into different, fixed, mutually distinct 
layers; these layers are executed in a fixed order and 
transforms in later layers are generally more detailed 
and more localized. In a purely deductive system, the 
theorem prover is in charge. The proof process is only 
restricted by properties of the applied calculus; essen- 
tially, every transform can be applied at every possible 
locale, at any time. Hence, a transformation system 
may in turn also be considered to be an operationaliza- 
tion of a deductive synthesis system. 

A syntax-directed source-to-source transformation sys- 
tem (in other words, a compiler) can be considered 
as the ultimate operationalization of a transformation 
system: the search for applicable transformations (i.e., 
transforms and locales) is completely eliminated, either 
by differentiation of the syntax which renders the choice 
of locales irrelevant, or by completion of the set of trans- 
forms which renders the choice of transforms irrelevant. 

The application of transformational techniques at 
NASA is driven by the need to achieve goals in soft- 
ware development: (i) lowering the development cost, 
{ii) lowering the required expertise, and (in) validating 
and verifying the developed system. These goals, how- 
ever, also apply to the development of transformation 
systems. 

The only variable cost in the development of a trans- 
formation system is the operationalization step. Since 
deductive synthesis systems need in principle no explicit 
operationalization, they are prime candidates. However, 




non-operationalized deductive systems, hit the com- 
plexity barrier inherent to theorem proving before they 
scale-up to non-toy problems. We have thus developed 
automated techniques which facilitate a deductive op- 
erationalization and at the same time keep the level of 
required expertise within reasonable bounds. 

2 AMPHION 

Amphion [9] implements synthesis-based reuse of library 
subroutines. It translates a problem specification ex- 
pressed in logic into a Fortran program that solves the 
problem. The Fortran program at present is a straight- 
line program that makes calls to JPL’s NAIF library 
subroutines, which are in the domain of solar system 
geometry. Amphion/NAIF deals with problem specifi- 
cations of about 30 lines, generating about 100 lines of 
Fortran code. In a smaller experiment, Amphion has 
been applied to airplane gridding subroutines used in 
Computational Fluid Dynamics. Amphion’s translation 
process comprises three main phases. 

1. Rewrite the abstract specification containing ab- 
stract operations to a canonical set of abstract op- 
erations (e.g. angle-between-two-planes to angle- 
between-the- two-normal- vectors) 

2. Use deductive synthesis to solve the abstract speci- 
fication; this yields a term over concrete operators. 

3. Translate the concrete term to a level isomorphic 
to Fortran, whence it is printed out. 

The first two phases are performed together by SNARK, 
a resolution theorem prover developed at SRI by Mark 
Stickel [13]. 

The theorems used in deductive synthesis in the second 
phase above capture three kinds of facts. 

Ra(x,F a {z)) (1) 

Fa(A x (c)) = A 2 (F c (c)) (2) 

A x (c) = A 2 (convert^ (c)) (3) 

The first expresses solutions to constraints (1); e.g., that 
F a (c) is a solution for y to R 0 (c, j/). The second are ho- 
momorphisms that implement an abstract operator F a 
as a concrete operator F c (2). The third are conversion 
axioms that convert a concrete value from one represen- 
tation of an abstract value into another representation 
(3). 

In SNARK, control of the deductive process is ef- 
fected by specifying a search strategy (we chose set-of- 
support), Recursive Path Orderings (RPO), and a func- 
tion that orders the goal clauses on the agenda. The 
RPO specifies whether an equation can be used as a 
rewrite rule, and whether a paramodulation (an equal- 
ity inference rule used in resolution theorem provers) 


should occur. However, Amphion loses completeness 
because it uses RPOs together with the set-of-support 
strategy. This means that it might, in theory, fail to 
synthesize a program for a given specification; in prac- 
tice, however, this has not been a problem. 

The deductive process generally proceeds in two steps: 
initially, the constraints are solved abstractly in terms of 
abstract operators using the first sort of fact; then con- 
crete operators are found that implement the abstract 
operators using the second sort of facts oriented left-to- 
right and the third sort of facts acting as glue between 
the second sort of facts. 

Concocting an agenda ordering function that ordered 
the tasks as described above was difficult, requiring rea- 
soning about the theorem proving process and the ax- 
ioms. Meta-Amphion [10, 15] was invented to deal au- 
tomatically with this problem. It finds subtheories that 
can be replaced by decision (satisfiability) procedures; 
it can be thought of as a theory compiler. The resulting 
system can replace several non-deterministic inference 
steps by a single deterministic computation step. This 
puts less reliance on the agenda ordering function. 

This makes it feasible for a domain expert (i.e., someone 
familiar with the subroutine library but not automated 
deduction) to write such axioms as (1,2,3). It is a design 
goal of Amphion that the people developing the library 
can extend Amphion to make the library more accessible 
to customers. The reason Amphion is successful is that, 
due to the use of deductive synthesis, each axiom can be 
reasoned about in isolation. A user interface enabling a 
domain expert to enter axioms is being built now. An 
RPO that orients an equation like (2) left-to-right can 
be generated automatically. 

For the automated reasoning about the correctness of 
axioms, a type checker was written that applied to the 
axioms and rewrite rules. We also experimented with 
running a completion procedure, but not thoroughly: 
we got lost trying to orient the equalities with which it 
came up. 

Translation Phase 

The translation phase applies several passes of 
correctness-preserving transformations to the answer 
term returned by SNARK. The following tasks are ac- 
complished: variables are introduced for subterms (this 
is required to pass data from one Fortran CALL state- 
ment to the next); parameters and dummy parameters 
are correctly assigned to subroutines that return mul- 
tiple values; and parameter passing conventions used 
by NAIF are implemented. The reason for the multi- 
pass architecture is separation of concerns. Because it 
is written as a transformation system, rather than a 
monolithic piece of code, we hope that it will be able 
to handle extensions, such as conditional statements; 
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however, this has not been tested. 

SNARK uses special-purpose formula data structures; 
the latter transformation phases use LISP lists. The 
latter phases were first written using Software Refin- 
ery, but were later transliterated into LISP. The Refine 
transformations became either LISP functions if they 
were complicated, e.g. rule schemas, or rewrite rules if 
they were simple enough. If a transformation is compli- 
cated, it is not analyzable, e.g. by the type checker. The 
following are instances of two rule schemas; the general 
schemas cannot be expressed as simple rewrite rules. 

(get-field 2 (tuple x y z) ) — > y 

(let ((x El) ...) bdy) 

— > 

(letp ((x . ..) suchthat (and (* x El) ... )) bdy) 
Tracing 

Tracing information was added to SNARK and the 
translation process to record where each rewrite and in- 
ference rule applied [16]. This is used to add comments 
to code as to which abstract variable a concrete variable 
corresponds. 

Abstractly, an explanation is a set of connections be- 
tween the Fortran program and the problem specifica- 
tion or parts of the domain theory. The explanation gen- 
eration process starts with the Fortran abstract syntax 
term, considered to be the root of a derivation tree, and 
traces backwards through the derivation to the specifi- 
cation and domain theory, considered to be the leaves 
of the derivation tree. Both the proof, resulting in an 
applicative answer term, and the trace of the program 
transformations, resulting in a Fortran abstract syntax 
tree, are part of this derivation tree. 

More formally, a derivation is a directed acyclic graph 
whose nodes are derivation and transformation steps 
and whose arcs encode the ” derived from” relation. 
Each derivation step is a triple (F, T, A), where A is an 
inference rule application, and F and T are the result- 
ing formula and answer term, respectively. Each appli- 
cation of a rule specifies: the inference rule that was ap- 
plied (one of transformation, demodulation, paramod- 
ulation, or resolution); the input formulae Fi , . . . , F n ; 
and the locations (path in the formulae) of the subterms 
to which the rule was applied. 

An explanation is an equality connecting locations in 
the Fortran abstract syntax term to constructs in the 
specification and domain theory, from which it was de- 
rived. Such an equality is computed in two steps. First, 
explanation equalities of each step of the derivation are 
extracted. Explanation equalities are equalities between 
locations in a formula F and locations in the parent for- 
mulae Fi, . . . , F n in the derivation. Second, an equality 
is computed between the relevant location in the For- 


tran abstract syntax term and locations in the leaves of 
the derivation. 

3 PROBABILISTIC NETWORKS 

The motivation for this work was to enable the quick 
development of small and efficient data analyzers that 
could be run on a Martian rover that uses a spectrome- 
ter to classify minerals. The relevant type of data analy- 
sis here is parameter estimation, where, given data that 
is drawn from a statistical model (i.e., a parameterized 
distribution), the task is to find the parameter values 
that maximize some measure of fitness of the data to 
the model. 

For such applications, experienced statisticians itera- 
tively experiment with different statistical models of the 
domain to be analyzed and with different algorithms to 
be employed to do the analysis. However, the implemen- 
taion of — even a prototype — data analysis program for 
each model is too expensive, which hinders the experi- 
mentation with different models and algorithms. More- 
over, turning these prototypes into realistic, space- and 
time-efficient code introduces another bottleneck. Thus, 
a program transformation system which translates a sta- 
tistical model into an efficient program (as suggested in 
[1]), would be valuable tool for the statistical data anal- 
ysis domain. 

We are currently developing the system PN (Probabilis- 
tic Networks) [2] which is intended to be such a tool. It 
uses a specification language which is based on Bayesian 
networks. This framework provides a way of deriving 
the full joint probability over variables, and various con- 
ditional and marginal probabilities as required to syn- 
thesize the analysis programs. Bayesian networks are a 
standard notation in the data analysis domain and have 
also been adopted for other successful data analysis sys- 
tems such as BUGS [14]. In the PN specification lan- 
guage the variables involved in the problem are defined 
together with their properties and inter-dependencies. 
For example, a variable can be declared as a constant, 
to depend deterministically on other variables, or to be 
distributed according to a distribution parameterized by 
other variables. 

The PN system generates pseudocode as an intermedi- 
ate stage before translation to Java. The pseudocode 
may embody efficient algorithm templates (e.g., clus- 
tering algorithms) but may also call routines for doing 
numerical optimization. 

The program synthesis process is triggered by a param- 
eter estimation problem statement which asks to find 
the values of some of the variables that optimize a given 
probability given data for other variables. The synthe- 
sis process itself transforms the network and the prob- 
lem statement. Subproblems are generated based on 
theorems about how the network can be decomposed. 
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Some of the problems can bo solved in closed form, 
some by calling numerical optimization or solution sub- 
routines, and some by using general purpose statistical 
algorithms; the transforms associated with these prob- 
lems return code fragments. These subproblems may 
also involve probability expressions which can be sim- 
plified based on further domain-specific theorems. Some 
other subproblems require further, recursive decompo- 
sition of the network; the associated transforms return 
simpler (network, problem statement)-pairs. 

All transformations are expressed as theorems in Horn- 
clause logics; this facilitates reasoning about their cor- 
rectness. For efficiency reasons, we use a Prolog- 
interpreter as a deductive execution machine (i.e., 
metaprogram); each transform becomes a rule which 
checks the preconditions of the underlying theorem and 
builds up pseudocode as it decomposes the network and 
problem specification. 

The final phase, which we have not finished, is a trans- 
formation from pseudocode to Java. Here, issues such 
as numerical stability and optimization need to be ad- 
dressed. 

4 JAVA PATHFINDER 

One of our goals is to apply and develop verification 
tools based on model checkers, theorem provers, and 
static analysis in general. A verification tool typically 
analyses a term in some language (a specification, a de- 
sign or a concrete program) with the purpose of deciding 
whether the term satisfies some given properties. This 
provides a complementary and higher level of debugging 
and V&V than traditional testing methods. The ben- 
efit becomes particularly apparent when dealing with 
non-deterministic systems, for example concurrent and 
distributed programs, where testing techniques typically 
cannot control the scheduling of interleaved processes. 
That is, using traditional testing, the successful run of 
a particular test suite will not be convincing due to the 
fact that the test run may have missed certain possible 
interleavings of the involved processes. A verification 
tool, such as a model checker, will typically explore all 
possible interleavings. 

Almost all existing verification tools, however, ana- 
lyze terms in some special purpose language particu- 
lar for that tool. We believe that verification tools will 
be accepted and used by NASA programmers only if 
they can analyze programs/designs/specifications writ- 
ten in the languages already used by these program- 
mers. Therefore we have started a general effort to 
bridge the gap between frequently used languages and 
these special purpose verification languages in order to 
benefit from the analytic engines underneath the lat- 
ter. In particular, we are developing a translator called 
Java PathFinder (JPF) [4] from the Java programming 


language to the Promela language of the Spin model 
checker. 

Spin [6] is a verification system that supports the design 
and verification of finite state asynchronous process sys- 
tems. Programs are formulated in Promela, which is a 
simple multi-threaded programming language with non- 
deterministic guarded commands. Processes communi- 
cate either via shared variables or via message passing 
through buffered channels. Properties to be verified are 
stated as assertions in the program or as formulae in 
the linear temporal logic LTL. The Spin model checker 
can automatically determine whether a program satis- 
fies a property, and in case the property does not hold, 
it generates an error trace. 

One of our efforts to formally verify a multi-threaded op- 
erating system for the Deep Space 1 spacecraft is doc- 
umented in [3]. The operating system is implemented 
in a multi-threaded version of Common LISP. The veri- 
fication effort consisted of hand-translating parts of the 
LISP code into the Promela language of Spin. A total 
of 5 errors were identified, a very successful result. It 
was, however, clear that hand- translating such amounts 
of code is impractical, and this motivated the translator 
described here. 

The Source Language 

The intended source language of the translation is Java 
as a general purpose programming language, rather 
than as a specialized World Wide Web applet program- 
ming language. A significant subset of Java version 1.0 
is currently supported by JPF: dynamic creation of ob- 
jects with methods and data such as integers, booleans, 
arrays and object references. Furthermore class inher- 
itance, dynamically created threads and synchroniza- 
tion primitives for modeling monitors (synchronized 
statements, and the wait and notify methods), excep- 
tions, thread interrupts, and most of the standard pro- 
gramming language constructs such as assignment state- 
ments, conditional statements and loops. However, the 
translator is still a prototype and misses some features, 
such as packages, overloading, method overriding, re- 
cursion, strings, floating point numbers, static variables 
and static methods. Finally, we do not translate the 
predefined class library. A well-formedness predicate is 
applied to the Java program before the translation to 
determine whether it falls in the subset of translatable 
programs. 

A Java program to be translated will contain temporal 
logic specifications stated as calls to special static meth- 
ods (like “Verify . assert (alarm == false)”) defined 
in a predefined class called Verify. When translating 
the Java program these calls will be translated into the 
assertion language of Promela or into LTL formulae. By 
defining the special Verify class and defining temporal 
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logic operators as Java methods (with possibly empty 
bodies since they mainly have importance for the trans- 
lation) we avoid extending the Java language with spec- 
i fieat ion cons t r 1 icts . 

The Translator 

The translator is written using Moscow ML (for parsing) 
and Common LISP for the translation. The result of the 
parsing is an abstract syntax tree represented as a LISP 
S-expression. A set of LISP macros using the object 
oriented features of LISP have been defined to allow 
pattern matching over this tree, since LISP does not 
have pattern matching. 

The translator is defined by recursive descent and pat- 
tern matching over the tree, printing Promela code di- 
rectly to an output file. Hence, the translation ' does 
not generate intermediate code - the translation is di- 
rect, source to target in one go. The generated code 
is, however, divided into C macro definitions and the 
Promela code itself (Promela code is allowed to contain 
calls of such macros, which will be expanded out). The 
macros typically model Java kernel operations such as 
applying synchronization locks on objects, throwing ex- 
ceptions and handling object references. The generated 
Promela code can therefore be regarded as belonging to 
an enriched Promela, augmented with these kernel op- 
erations, and the following macro expansion will then 
result in lower level Promela code. 

The translation functions all have access to the complete 
abstract syntax tree in addition to the portion of the 
tree they are respectively designed to transform. This 
is in contrast to maintaining a symbol table that grows 
as the tree is descended. Just keeping the tree around 
seemed simpler than maintaining a nearly isomorphic 
symbol table. 

The translator does not keep a record of the applied 
transformations (LISP functions). However, traceabil- 
ity between the source Java code and the target Promela 
code is essential when the Spin model checker generates 
an error trace caused by a broken temporal property. 
This error trace can be simulated in Spin, illustrating 
the steps that lead to the error. However, a simulation 
is needed at the Java source code level, and this requires 
a mapping from Spin error traces to corresponding Java 
error traces. We don’t have this mapping implemented 
for the moment. Instead, the Java program is allowed 
to contain calls of Verify ♦ print (...) statements that 
will cause printing on graphical message sequence charts 
when running the error trace in Spin. 

Correctness and Scalability 

The translator itself is a prototype experimenting with 
the idea of transforming a real programming language 
into a model checker language. Since this is a novel and 
challenging problem in itself we have not spent efforts 


to prove the translation correct. Our main goal is to see 
whether this approach can have practical importance. 
The translator is expected to be scalable to almost full 
Java 1.0. 

A major challenge, however, is whether model check- 
ing of the resulting target Promela program scales up 
to large programs. A 1000 line Java program has been 
analyzed using the tool, confirming the existence of a 
deadlock [5]. In order to handle substantially larger pro- 
grams one needs to apply transformations to the Java 
source program in order to remove details that are not 
important for the properties to be verified. Together 
with Stanford University and Kansas State University 
we are currently exploring techniques such as program 
abstraction, program slicing, and static analysis in gen- 
eral. Some of these techniques can be regarded as pro- 
gram transformation schemes that produce target pro- 
grams with a smaller state space than the source pro- 
grams. The intention is to apply such transformations 
on a Java program before the translation into Promela 
is applied. In a current project, a 13 K line satellite file 
communication protocol written in Java is being ana- 
lyzed. This work will identify useful abstraction tech- 
niques. 

An alternative solution to deal with scalability is com- 
positional model checking, where only smaller portions 
of code are verified at a time, and where the results 
are then composed to deduce the correctness of larger 
portions of code. Imagine for example that one has pro- 
grammed a Java class modeling a bounded buffer which 
is to be used in a multi threaded context. We can then 
analyze its correctness by “putting it in parallel” with 
an aggressive environment consisting of producer and 
consumer threads, and we cam then analyze the proper- 
ties of this small system. At least one of the errors in 
the Deep Space 1 operating system could have been 
found this way. 

A different issue is the treatment of dynamic object cre- 
ation. Currently the heap is modeled as a fixed size 
array. We axe studying how to model a dynamic heap 
and garbage collection in a model checker. Note that 
no alias analysis is done at this point. It will, however, 
play a role in optimizing the verification. 

5 CONCLUSIONS 

In this paper we have summarized a variety of trans- 
formation systems developed at NASA Ames for the 
purpose of automating aspects of software engineering. 
Our transformation systems lower the cost of developing 
software, lower the cost for maintaining software, lower 
the expertise required to develop software, and provide 
more effective ways to verify and validate software sys- 
tems. 

To achieve these goals, we employ a spectrum of trans- 
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format ion al technology, ranging from transformation 
systems organized as a compiler to deductive synthe- 
sis. The choice of transformation technology depends 
upon the characteristics of the source to target trans- 
lation problem. When the translation problem can 
be decomposed along the lines of syntactic constructs, 
then syntax-directed source-to-source transformation 
systems (non-optimizing compilers) are highly efficient 
and relatively easy to control. For the first generation 
of the Java PathFinder, this technology worked well; 
enabling rapid experimentation of different ways to em- 
bed Java syntactic constructs into Promela. When the 
translation problem is characterized by a sequence of 
intermediate forms, then a sequence of sets of trans- 
formations (applied to exhaustion at each stage) works 
well. As long as the number of transformations applied 
at each stage is small, then issues such as termination 
- i.e., a well defined ordering ensuring progress towards 
the next intermediate form - are manageable, but re- 
quire more expertise to control than compiler technol- 
ogy. The interaction of the transformation rules requires 
care to ensure correctness. Deductive synthesis technol- 
ogy offers the potential to develop transformation sys- 
tems not by defining ’how' constructs are transformed 
from source to target, but rather the declarative rela- 
tionship between source and target. For small domains, 
the deduction engine will find the right path for trans- 
forming from source to target by exploring the possibil- 
ities denoted by the declarative relationship. However, 
as the domain becomes larger, this exploration becomes 
combinatorially prohibitive. This leads to the goal of 
operationalizing a deductive synthesis system. 

Despite the effectiveness of these systems, we perceive 
barriers to the more wide-spread use of transformational 
technology. To break through these barriers, the same 
goals that apply to improving software development in 
general also need to be achieved to improving the de- 
velopment process for transformation systems: lower 
the cost of developing a transformation system, lower 
the cost to modify and extend a transformation system, 
lower the expertise required to develop a transformation 
system, and provide effective ways to verify and validate 
transformation systems. 
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